[SPARK-14123] [SPARK-14384] [SQL] Handle CreateFunction/DropFunction #12117
Conversation
Conflicts: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala
Conflicts: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkQl.scala
…egistry loads permanent function.
Test build #54723 has finished for PR 12117 at commit
Test build #54724 has finished for PR 12117 at commit
* alias: the class name that implements the created function.
* resources: Jars, files, or archives which need to be added to the environment when the function
*   is referenced for the first time by a session.
* isTemp: indicates if it is a temporary function.
can you follow the javadoc format of the other commands, something like:
* ...
* The syntax of using this command in SQL is:
* {{{
* SHOW FUNCTIONS [LIKE pattern]
* }}}
The command format is pretty useful. Same in DropFunction
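Applied to DropFunction, the suggested Scaladoc style might look like the sketch below. The class shape and the exact syntax line are illustrative assumptions, not copied from the diff:

```scala
/**
 * Drops a function from the catalog.
 *
 * The syntax of using this command in SQL is:
 * {{{
 *   DROP [TEMPORARY] FUNCTION [IF EXISTS] [db_name.]function_name
 * }}}
 */
case class DropFunction(functionName: String, isTemp: Boolean, ifExists: Boolean)
```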
Done
Thanks @yhuai for improving the original changes. This looks great now.
@@ -208,6 +261,118 @@ class HiveSparkSubmitSuite
  }
}

// This application is used to test defining a new Hive UDF (with an associated jar)
Yea. This is good!
@@ -201,8 +201,13 @@ class TestHiveContext private[hive](
  }

override lazy val functionRegistry = {
  new TestHiveFunctionRegistry(
    org.apache.spark.sql.catalyst.analysis.FunctionRegistry.builtin.copy(), self.executionHive)
// TestHiveFunctionRegistry tracks removed functions. So, we cannot simply use
Actually I do not fully understand this comment.
Updated
Test build #54931 has finished for PR 12117 at commit
Test build #54945 has finished for PR 12117 at commit
Test build #54948 has finished for PR 12117 at commit
LGTM now.
Test build #54987 has finished for PR 12117 at commit
}
if (!ifExists && !catalog.functionExists(func)) {
  throw new AnalysisException(
    s"Function '$functionName' does not exist in database '$dbName'.")
I still think this should be handled within `dropFunction` itself (by passing the `ifExists` flag), but not a big deal.
oh yes. I totally agree. We should do it.
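The refactoring agreed on above — pushing the IF EXISTS handling into `dropFunction` itself — could look roughly like this minimal sketch. The class and exception names here are simplified stand-ins, not the actual Spark catalog types:

```scala
import scala.collection.mutable

class AnalysisException(message: String) extends Exception(message)

// Toy catalog: dropFunction owns the existence check, so callers just
// forward the IF EXISTS flag instead of calling functionExists first.
class SessionCatalogSketch {
  private val functions = mutable.Map.empty[String, String] // name -> class name

  def createFunction(name: String, className: String): Unit =
    functions(name) = className

  def functionExists(name: String): Boolean = functions.contains(name)

  // ifExists = true means "silently succeed when the function is absent"
  def dropFunction(name: String, ifExists: Boolean): Unit = {
    val removed = functions.remove(name).isDefined
    if (!removed && !ifExists) {
      throw new AnalysisException(s"Function '$name' does not exist.")
    }
  }
}
```

With this shape, the command implementation shrinks to a single `catalog.dropFunction(functionName, ifExists)` call.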
## What changes were proposed in this pull request?

This is a followup to #12117 and addresses some of the TODOs introduced there. In particular, the resolution of the database is now pushed into the session catalog, which knows about the current database. Further, the logic for checking whether a function exists is pushed into the external catalog. No change in functionality is expected.

## How was this patch tested?

`SessionCatalogSuite`, `DDLSuite`

Author: Andrew Or <andrew@databricks.com>

Closes #12198 from andrewor14/function-exists.
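The database-resolution change described in that follow-up can be illustrated with a toy model. The class and method names below are assumptions for illustration; the real SessionCatalog API is richer:

```scala
// The catalog, not the caller, resolves an optionally-qualified function
// name against the current database it already tracks.
class NameResolver(var currentDatabase: String) {
  def resolveFunction(db: Option[String], name: String): (String, String) =
    (db.getOrElse(currentDatabase), name)
}
```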
What changes were proposed in this pull request?
This PR implements CreateFunction and DropFunction commands. Besides implementing these two commands, we also change how to manage functions. Here are the main changes.
- `FunctionRegistry` will be a container to store all function builders and it will not actively load any functions. Because of this change, we do not need to maintain a separate registry for HiveContext. So, `HiveFunctionRegistry` is deleted.
- A permanent function is not eagerly registered in `FunctionRegistry`, but its metadata is stored in the external catalog. For this case, SessionCatalog will (1) load the metadata from the external catalog, (2) load all needed resources (i.e. jars and files), (3) create a function builder based on the function definition, and (4) register the function builder in the `FunctionRegistry`.
- `UnresolvedGenerator` is created. So, the parser will not need to call `FunctionRegistry` directly during parsing, which is not a good time to create a Hive UDTF. In the analysis phase, we will resolve `UnresolvedGenerator`.

This PR is based on @viirya's #12036.
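The "container" role of `FunctionRegistry` described in the first bullet can be sketched as below. This is a deliberately simplified illustration, not the actual Spark API — it only shows the idea that the registry stores builders handed to it and never loads functions on its own:

```scala
import scala.collection.mutable

// Stores function builders registered with it; never loads anything on
// its own. Lazily loading permanent functions is the SessionCatalog's job.
class RegistrySketch[T] {
  private val builders = mutable.Map.empty[String, Seq[Any] => T]

  def registerFunction(name: String, builder: Seq[Any] => T): Unit =
    builders(name.toLowerCase) = builder

  def lookupFunction(name: String, args: Seq[Any]): T = {
    val builder = builders.getOrElse(
      name.toLowerCase,
      throw new NoSuchElementException(s"Undefined function: $name"))
    builder(args)
  }

  def dropFunction(name: String): Boolean =
    builders.remove(name.toLowerCase).isDefined
}
```

Under this design a permanent function only shows up here after the session catalog has fetched its metadata and resources and called `registerFunction`.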
How was this patch tested?
Existing tests and new tests.
TODOs
[x] Self-review
[x] Cleanup
[x] More tests for create/drop functions (we need more tests for permanent functions).
[ ] File JIRAs for all TODOs
[x] Standardize the error message when a function does not exist.